Abstract. The complexity of maximal likelihood decoding of the Reed-Solomon
codes $[q-1, k]_q$ is a well-known open problem. The only known result [4] in
this direction states that it is at least as hard as the discrete logarithm in
some cases where the information rate unfortunately goes to zero. In this
paper, we remove the rate restriction and prove that the same complexity result
holds for any positive information rate. In particular, this resolves an open
problem left in [4], and rules out the possibility of a polynomial time
algorithm for the maximal likelihood decoding problem of Reed-Solomon codes of
any rate under a well-known cryptographic hardness assumption. As a side
result, we give an explicit construction of Hamming balls of radius bounded
away from the minimum distance, which contain exponentially many codewords for
Reed-Solomon codes of any positive rate less than one. The previous
constructions in [2, 7] only apply to Reed-Solomon codes of diminishing rates.
We also give an explicit construction of Hamming balls of relative radius less
than 1 which contain subexponentially many codewords for Reed-Solomon codes of
rate approaching one.



1 Introduction 

Let $\mathbf{F}_q$ be a finite field of $q$ elements and of characteristic $p$. A
linear error-correcting $[n, k]_q$ code is defined to be a linear subspace of
dimension $k$ in $\mathbf{F}_q^n$. Let $D = \{x_1, \ldots, x_n\} \subseteq \mathbf{F}_q$
be a subset of cardinality $|D| = n > 0$. For $1 \le k \le n$, let $f$ run over
all polynomials in $\mathbf{F}_q[x]$ of degree at most $k - 1$. The vectors of the
form

$$(f(x_1), \ldots, f(x_n)) \in \mathbf{F}_q^n$$

constitute a linear error-correcting $[n, k]_q$ code. If $D = \mathbf{F}_q^*$, it is
famously known as the Reed-Solomon code. If $D = \mathbf{F}_q$, it is known as the
extended Reed-Solomon code. We denote them by $RS_q[q-1, k]$ and $RS_q[q, k]$
respectively. We simply call it a generalized Reed-Solomon code if $D$ is an
arbitrary subset of $\mathbf{F}_q$.
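As a small illustration of the definition, the sketch below evaluates a message
polynomial at every point of $D$ over a prime field. The function name
`rs_encode`, the choice $q = 7$ and the restriction to prime $q$ (so that field
arithmetic is plain arithmetic modulo $q$) are assumptions made here for
illustration only.

```python
def rs_encode(message, points, p):
    """Evaluate f(x) = sum(message[j] * x**j) at each point of D, modulo the prime p."""
    codeword = []
    for x in points:
        value = 0
        for coefficient in reversed(message):  # Horner's rule, highest degree first
            value = (value * x + coefficient) % p
        codeword.append(value)
    return codeword


p = 7
message = [3, 1]                                        # f(x) = 3 + x, so k = 2
extended = rs_encode(message, list(range(p)), p)        # D = F_7, a word of RS_7[7, 2]
primitive = rs_encode(message, list(range(1, p)), p)    # D = F_7^*, a word of RS_7[6, 2]
```

Dropping the evaluation point $0$ turns the extended codeword into the
corresponding Reed-Solomon codeword.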



Remark 1. In some coding theory literature, $RS_q[q-1, k]$ is called the
primitive Reed-Solomon code, and a generalized Reed-Solomon code $[n, k]_q$ is
defined to be

$$\{(y_1 f(x_1), \ldots, y_n f(x_n)) \mid f \in \mathbf{F}_q[x], \deg(f) < k\},$$

where $y_1, y_2, \ldots, y_n$ are nonzero elements in $\mathbf{F}_q$.

The minimal distance of a generalized Reed-Solomon $[n, k]_q$ code is $n - k + 1$
because a non-zero polynomial of degree at most $k - 1$ has at most $k - 1$
zeroes. The ultimate decoding problem for an error-correcting $[n, k]_q$ code is
the maximal likelihood decoding: given a received word $u \in \mathbf{F}_q^n$, find a
codeword $v$ such that the Hamming distance $d(u, v)$ is minimal. When the
number of errors is reasonably small, say, smaller than $n - \sqrt{nk}$, the
list decoding algorithm of Guruswami-Sudan gives a polynomial time algorithm to
find all the codewords within that distance for the generalized Reed-Solomon
$[n, k]_q$ code.
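To make the decoding problem concrete, here is a brute-force sketch of maximal
likelihood decoding for a tiny extended Reed-Solomon code over a prime field. It
enumerates all $q^k$ messages, so it only illustrates the problem statement and
says nothing about efficient decoding. The names `evaluate` and `ml_decode` and
the parameters $q = 7$, $k = 2$ are hypothetical choices.

```python
from itertools import product


def evaluate(coefficients, x, p):
    """Horner evaluation of a polynomial (coefficients low degree first) modulo p."""
    value = 0
    for coefficient in reversed(coefficients):
        value = (value * x + coefficient) % p
    return value


def ml_decode(received, k, p):
    """Return (distance, message, codeword) for a nearest codeword of RS_p[p, k]."""
    best = None
    for message in product(range(p), repeat=k):
        codeword = [evaluate(message, x, p) for x in range(p)]
        distance = sum(a != b for a, b in zip(codeword, received))
        if best is None or distance < best[0]:
            best = (distance, list(message), codeword)
    return best


# The codeword of f(x) = 3 + x over F_7 with two positions corrupted.
received = [3, 4, 0, 6, 0, 1, 5]
result = ml_decode(received, k=2, p=7)
```

Here the minimum distance is $7 - 2 + 1 = 6$, so two errors still leave a unique
nearest codeword.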

When the number of errors increases beyond $n - \sqrt{nk}$, it is not known
whether there exists a polynomial time decoding algorithm. The maximal
likelihood decoding of a generalized Reed-Solomon $[n, k]_q$ code is known to be
NP-complete [6]. The difficulty is caused by the combinatorial complication of
the subset $D$, which has no structure. In fact, there is a straightforward way
to reduce the subset sum problem in $D$ to the deep hole problem of a
generalized Reed-Solomon code, which can then be reduced to the maximal
likelihood decoding problem [3]. Note that the subset sum problem for
$D \subseteq \mathbf{F}_q$ is hard only if $|D|$ is much smaller than $q$.

In practical applications, one rarely uses the case of an arbitrary subset $D$.
The most widely used case is $D = \mathbf{F}_q^*$, which has rich algebraic
structure. This case is essentially equivalent to the case $D = \mathbf{F}_q$. For
simplicity, we focus on the extended Reed-Solomon code $RS_q[q, k]$ in this
paper; all our results can be applied to the Reed-Solomon code $RS_q[q-1, k]$
with little modification. The maximal likelihood decoding problem of
$RS_q[q, k]$ is considered to be hard, but the attempts to prove its
NP-completeness have failed so far. The methods in [6], [3] cannot be
specialized to $RS_q[q, k]$ because we have lost the freedom to select $D$. The
only known complexity result [4] in this direction says that the decoding of
$RS_q[q, k]$ is at least as hard as the discrete logarithm in
$\mathbf{F}_{q^h}^*$ for $h$ satisfying

$$h \le \sqrt{q} - k, \quad h \le q^{\frac{1}{2+\epsilon}} + 1 \quad \text{and} \quad h \le \frac{k - \frac{4}{\epsilon} - 2}{\frac{4}{\epsilon} + 1}$$

for any $\epsilon > 0$. The main weakness of this result is that $\sqrt{q}$ has to
be greater than $k$, which implies that the information rate $k/q$ goes to zero.
But in the real world, we tend to use Reed-Solomon codes of high rates. Our
main result in this paper is to remove this restriction. Precisely, we show
that

Theorem 1. For any $c \in [0, 1]$, there exists an infinite explicit family of
Reed-Solomon codes

$$\{RS_{q_1}[q_1, k_1], RS_{q_2}[q_2, k_2], \ldots, RS_{q_i}[q_i, k_i], \ldots\}$$

with $q_i = O(i^2 \log^2 i)$ and $k_i = (c + o(1)) q_i$ such that if there is a
polynomial time randomized algorithm solving the maximal likelihood decoding
problem for the above family of codes, then there is a polynomial time
randomized algorithm solving the discrete logarithm problem over all the fields
in

$$\{\mathbf{F}_{q_1^{h_1}}, \mathbf{F}_{q_2^{h_2}}, \ldots, \mathbf{F}_{q_i^{h_i}}, \ldots\},$$

where $h_i$ is any integer less than $q_i^{1/4+o(1)}$.

The discrete logarithm problem over finite fields is well studied in
computational number theory. It is not believed to have a polynomial time
algorithm. Many cryptographic protocols base their security on this assumption.
The fastest general purpose algorithm [1] solves the discrete logarithm problem
over the finite field $\mathbf{F}_{q^h}^*$ in conjectured time

$$\exp\left(O\left((\log q^h)^{1/3} (\log\log q^h)^{2/3}\right)\right).$$

Thus, in the above theorem, it is best to take $h_i$ as large as possible (close
to $q_i^{1/4+o(1)}$) in order for the discrete logarithm to be hard. If
$h = q^{1/4+o(1)}$, this complexity is subexponential in $q$. The above theorem
rules out a polynomial time algorithm for the maximal likelihood decoding
problem of Reed-Solomon codes of any rate under a cryptographic hardness
assumption.
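To see why the running time is subexponential in $q$, the following worked
estimate substitutes $h = q^{1/4}$ into the conjectured time bound. Dropping the
$o(1)$ term in the exponent of $h$ is a simplifying assumption made here for
illustration.

```latex
\begin{align*}
\log q^h &= h \log q = q^{1/4} \log q, \\
\log\log q^h &= \tfrac{1}{4} \log q + \log\log q = \Theta(\log q), \\
(\log q^h)^{1/3} (\log\log q^h)^{2/3}
  &= \Theta\bigl(q^{1/12} (\log q)^{1/3} (\log q)^{2/3}\bigr)
   = \Theta\bigl(q^{1/12} \log q\bigr).
\end{align*}
```

Under this assumption the conjectured time is $\exp(O(q^{1/12} \log q))$, which
grows faster than any polynomial in $q$ but slower than $\exp(q^{\delta})$ for
any fixed $\delta > 1/12$.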

Our earlier paper [3] proved the theorem for $c = 0$ (in that case we have
$h_i \le q_i^{1/2+o(1)}$). In this paper, we shall be concentrating on
$0 < c \le 1$. The results in this paper are built on the methods and results of
our earlier paper. We shall show that the case $c = 1$ follows from the case
$c = 0$ by a dual argument. The main new idea for the case $0 < c < 1$ is to
exploit the role of subfields contained in $\mathbf{F}_q$. Assume that $q = q_1^2$
and that $h = q^{1/4+o(1)}$ is a positive integer. We have
$\mathbf{F}_{q_1} \subset \mathbf{F}_q \subset \mathbf{F}_{q^h}$. Let $\alpha$ be an element
in $\mathbf{F}_{q^h}$ such that $\mathbf{F}_{q_1}[\alpha] = \mathbf{F}_q[\alpha] = \mathbf{F}_{q^h}$.
We observe that if every element in $\mathbf{F}_{q^h}$ can be written as a product
of $g_1$ many distinct $\alpha + a$ with $a \in \mathbf{F}_{q_1}$, then for any
nonnegative integer $g_2 \le q - q_1$, every element in $\mathbf{F}_{q^h}$ can be
written as a product of $g_1 + g_2$ many distinct $\alpha + a$ with
$a \in \mathbf{F}_q$. This observation enables us to prove the main technical lemma
that for any constant $0 < c < 1$, any element in $\mathbf{F}_{q^h}$ can be written
as a product of $\lfloor cq \rfloor$ distinct factors in
$\{\alpha + a \mid a \in \mathbf{F}_q\}$ for $q$ large enough.

By a direct counting argument, for any positive integer $r < q - k$, there
exists a Hamming ball of radius $r$ containing at least
$\binom{q}{r} / q^{q-r-k}$ many codewords in the Reed-Solomon code $RS_q[q, k]$.
Thus, if $k = \lfloor cq \rfloor$ for a constant $0 < c < 1$, we set
$r = \lfloor q - k - q^{1/4} \rfloor$ and the number of codewords in the Hamming
ball will be exponential in $q$. However, finding such a Hamming ball
deterministically is a hard problem. There is some work on this problem [7],
[2], but all the results are for codes of diminishing rates. Our contribution
to this problem is to remove the rate restriction.
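The size of this counting bound can be checked numerically. The sketch below
evaluates $\binom{q}{r} / q^{q-r-k}$ exactly with integers for the radius
$r = \lfloor q - k - q^{1/4} \rfloor$. The function names and the sample values
$q = 256$, $c = 1/2$ are assumptions chosen for illustration.

```python
from math import comb, isqrt


def radius(q, k):
    """floor(q - k - q**(1/4)), computed with integers only."""
    fourth_root = isqrt(isqrt(q))        # floor of the fourth root of q
    if fourth_root ** 4 < q:
        fourth_root += 1                 # subtracting a non-integer: round it up
    return q - k - fourth_root


def counting_bound(q, k, r):
    """Integer part of binom(q, r) / q**(q - r - k)."""
    return comb(q, r) // q ** (q - r - k)


q, k = 256, 128                          # rate c = 1/2
r = radius(q, k)
bound = counting_bound(q, k, r)
```

For these values the bound has more than 200 binary digits, while the code
itself has only $q^k = 2^{1024}$ codewords in total.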

Theorem 2. For any $c \in (0, 1)$, there exists a deterministic algorithm that,
given a positive integer $i$, outputs a prime power $q$, a positive integer $k$
and a vector $v \in \mathbf{F}_q^q$ such that

- $q = O(i^2 \log^2 i)$ and $k = (c + o(1)) q$, and

- the Hamming ball centered at $v$ and of radius $q - k - q^{1/4+o(1)}$ contains
  $\exp(\Omega(q))$ many codewords in $RS_q[q, k]$, and

- the algorithm runs in time $i^{O(1)}$.

In our construction, the ratio between the Hamming ball radius
$q - k - q^{1/4+o(1)}$ and the minimum distance $q - k + 1$, which is known as
the relative radius of the Hamming ball, approaches 1. The same problem was
encountered in [7, 2], where there is the further restriction that the
information rate goes to zero. In contrast, the above theorem allows the
information rate to be positive. The following result shows that we can
decrease the relative radius to a constant less than 1 if we work with codes
whose information rate goes to one.

Theorem 3. For any real number $\rho \in (2/3, 1)$, there is a deterministic
algorithm that, given a positive integer $i$, outputs a prime power
$q = i^{O(1)}$, a positive integer $k = q - o(\sqrt{q})$ and a vector
$v \in \mathbf{F}_q^q$ such that the Hamming ball centered at $v$ and of radius
$\lfloor \rho (q - k + 1) \rfloor$ contains at least $q^i$ many codewords in
$RS_q[q, k]$. The algorithm has time complexity $i^{O(1)}$. Note that the
information rate is $1 - o(1)$.

It would be interesting for future research to extend the result to all
$\rho \in (1/2, 1)$, and to prove a similar result with the information rate
positive and the relative radius less than 1.

Given a real number $\rho \in (0, 1)$, the codes where some Hamming ball of
relative radius $\rho$ contains superpolynomially many codewords are called
$\rho$-dense. It was known in [5] how to efficiently construct such codes for
any $\rho \in (1/2, 1)$, but finding the center of such a Hamming ball in
deterministic polynomial time is an open problem. In this paper, we solve this
problem if the relative radius falls in the range $(2/3, 1)$ using Reed-Solomon
codes of rate approaching one. This result derandomizes an important step in
the inapproximability result for the minimum distance problem of a linear code
in [5]. To completely derandomize the reduction there, however, one needs to
find a linear map from a dense Hamming ball into a linear subspace. This is
again an interesting future research direction.

2 Previous work for rate c = 0

For the reader's convenience, in this section we sketch the main ideas in our
earlier paper [3]. This will be the starting point of our new results in the
present paper.

Let $h \ge 2$ be a positive integer. Let $h(x)$ be a monic irreducible
polynomial in $\mathbf{F}_q[x]$ of degree $h$. Let $\alpha$ be a root of $h(x)$ in an
extension field. Then, $\mathbf{F}_q[\alpha] = \mathbf{F}_{q^h}$ is a finite field of $q^h$
elements. We have

Theorem 4. Let $h < g < q$ be positive integers. If every element of
$\mathbf{F}_{q^h}^*$ can be written as a product of exactly $g$ distinct linear
factors of the form $\alpha + a$ with $a \in \mathbf{F}_q$, then the discrete
logarithm in $\mathbf{F}_{q^h}^*$ can be efficiently reduced in random time
$q^{O(1)}$ to the maximal likelihood decoding of the Reed-Solomon code
$RS_q[q, g - h]$.



Proof. In [3], the same result was stated for the weaker bounded distance
decoding. Since the specific words used in [3] have exact distance $q - g$ to
the code $RS_q[q, g - h]$, the bounded distance decoding and the maximal
likelihood decoding are equivalent for those special words. Thus, we may
replace bounded distance decoding by the maximal likelihood decoding in the
above statement. We now sketch the main ideas.

Let $h(x)$ be a monic irreducible polynomial of degree $h$ in $\mathbf{F}_q[x]$. We
shall identify the extension field $\mathbf{F}_{q^h}$ with the residue field
$\mathbf{F}_q[x]/(h(x))$. Let $\alpha$ be the class of $x$ in $\mathbf{F}_q[x]/(h(x))$.
Then, $\mathbf{F}_q[\alpha] = \mathbf{F}_{q^h}$. Consider the Reed-Solomon code
$RS_q[q, g - h]$. For a polynomial $f(x) \in \mathbf{F}_q[x]$ of degree at most
$h - 1$, let $u_f$ be the received word

$$u_f = \left(\frac{f(a)}{h(a)} + a^{g-h}\right)_{a \in \mathbf{F}_q}.$$

By assumption, we can write

$$f(\alpha) = \prod_{i=1}^{g} (\alpha + a_i),$$

where the $a_i \in \mathbf{F}_q$ are distinct. It follows that as polynomials, we
have the identity

$$\prod_{i=1}^{g} (x + a_i) = f(x) + t(x) h(x),$$

where $t(x) \in \mathbf{F}_q[x]$ is some monic polynomial of degree $g - h$. Thus,

$$\frac{f(x)}{h(x)} + x^{g-h} = \frac{\prod_{i=1}^{g} (x + a_i)}{h(x)} - \left(t(x) - x^{g-h}\right),$$

where $t(x) - x^{g-h} \in \mathbf{F}_q[x]$ is a polynomial of degree at most
$g - h - 1$ and thus corresponds to a codeword. This equation implies that the
distance of the received word $u_f$ to the code $RS_q[q, g - h]$ is at most
$q - g$, since the two words agree at the $g$ points $a = -a_i$. If the distance
is smaller than $q - g$, then one gets a monic polynomial of degree $g$ with
more than $g$ distinct roots. Thus, the distance of $u_f$ to the code is exactly
$q - g$.
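The distance computation can be checked on a toy instance. The sketch below
assumes the received word has the form
$u_f = (f(a)/h(a) + a^{g-h})_{a \in \mathbf{F}_q}$ and works over the prime field
$\mathbf{F}_7$ with $h(x) = x^2 + 1$, which is irreducible modulo 7. All names,
the field size and the chosen shifts $a_i$ are hypothetical.

```python
P = 7                   # q = 7, a prime, so F_q is arithmetic modulo 7
H = [1, 0, 1]           # h(x) = 1 + x^2, coefficients low degree first
SHIFTS = [1, 2, 3, 4]   # the distinct a_i, so g = 4


def poly_mul(a, b, p):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out


def poly_divmod(num, den, p):
    """Quotient and remainder of num / den modulo p; den must be monic."""
    num = num[:]
    quot = [0] * (len(num) - len(den) + 1)
    for k in range(len(num) - len(den), -1, -1):
        coef = num[k + len(den) - 1]
        quot[k] = coef
        for j, d in enumerate(den):
            num[k + j] = (num[k + j] - coef * d) % p
    return quot, num[:len(den) - 1]


def poly_eval(c, x, p):
    acc = 0
    for coef in reversed(c):
        acc = (acc * x + coef) % p
    return acc


def demo():
    g, h = len(SHIFTS), len(H) - 1
    prod = [1]
    for a in SHIFTS:
        prod = poly_mul(prod, [a, 1], P)          # multiply by (x + a)
    t, f = poly_divmod(prod, H, P)                # prod = f + t * h
    received, codeword = [], []
    for a in range(P):
        inv_h = pow(poly_eval(H, a, P), P - 2, P)  # 1 / h(a); h has no root in F_q
        received.append((poly_eval(f, a, P) * inv_h + pow(a, g - h, P)) % P)
        codeword.append((pow(a, g - h, P) - poly_eval(t, a, P)) % P)
    distance = sum(r != c for r, c in zip(received, codeword))
    return t, f, distance
```

In this instance the codeword polynomial is $x^{g-h} - t(x) = 4x + 1$, of degree
at most $g - h - 1 = 1$, and the two words differ in exactly $q - g = 3$
positions.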

Let $C_f$ be the set of codewords in $RS_q[q, g - h]$ which have distance
exactly $q - g$ to the received word $u_f$. The cardinality of $C_f$ is then
equal to $\frac{1}{g!}$ times the number of ordered ways that $f(\alpha)$ can be
written as a product of exactly $g$ distinct linear factors of the form
$\alpha + a$ with $a \in \mathbf{F}_q$. For error radius $q - g$, the maximal
likelihood decoding of the received word $u_f$ is the same as finding a solution
to the equation

$$f(\alpha) = \prod_{i=1}^{g} (\alpha + a_i),$$

where the $a_i \in \mathbf{F}_q$ are distinct.

To show that the discrete logarithm in $\mathbf{F}_{q^h}^*$ can be reduced to the
decoding of the words of the type $u_f$, we apply the index calculus algorithm.
Let $b(\alpha)$ be a primitive element of $\mathbf{F}_{q^h}^*$. Taking
$f(\alpha) = b(\alpha)^i$ for a random $0 \le i \le q^h - 2$, the maximal
likelihood decoding of the word $u_f$ gives a relation

$$b(\alpha)^i = \prod_{j=1}^{g} (\alpha + a_j(i)),$$

where the $a_j(i) \in \mathbf{F}_q$ are distinct for $1 \le j \le g$. This gives the
congruence equation

$$i \equiv \sum_{j=1}^{g} \log_{b(\alpha)} (\alpha + a_j(i)) \pmod{q^h - 1}.$$

Repeating the decoding and letting $i$ vary, this gives enough linear equations
in the $q$ variables $\log_{b(\alpha)} (\alpha + a)$ ($a \in \mathbf{F}_q$). Solving
the linear system modulo $q^h - 1$, one finds the values of
$\log_{b(\alpha)} (\alpha + a)$ for all $a \in \mathbf{F}_q$. To compute the discrete
logarithm of an element $v(\alpha) \in \mathbf{F}_{q^h}^*$ with respect to the base
$b(\alpha)$, one applies the decoding to the element $v(\alpha)$ and finds a
relation

$$v(\alpha) = \prod_{j=1}^{g} (\alpha + b_j),$$

where the $b_j \in \mathbf{F}_q$ are distinct. Then,

$$\log_{b(\alpha)} v(\alpha) \equiv \sum_{j=1}^{g} \log_{b(\alpha)} (\alpha + b_j) \pmod{q^h - 1}.$$

In this way, the discrete logarithm of $v(\alpha)$ is computed. The detailed
analysis can be found in [3]. □
The above theorem is the starting point of our method. In order to use it, one
needs good information on the integers $g$ satisfying the assumption of the
theorem. This is a difficult theoretical problem in general. It can be done in
some cases, with the help of Weil's character sum estimate together with a
simple sieving. Precisely, the following result was proved in [3].

Theorem 5. Let h < g be positive integers. Let 



N(g,h) = - 
9 

Then every element in $\mathbf{F}_{q^h}^*$ can be written in at least $N(g, h)$ ways
as a product of exactly $g$ distinct linear factors of the form $\alpha + a$
with $a \in \mathbf{F}_q$. If for some constant $\epsilon > 0$, we have

$$q \ge \max(g^2, (h - 1)^{2+\epsilon}), \quad g \ge \left(\frac{4}{\epsilon} + 2\right)(h + 1),$$

then

$$N(g, h) \ge q^{g/2} / g! > 0.$$

The main drawback of the above theorem is the condition $q \ge g^2$, which
translates to the condition that the information rate $(g - h)/q$ goes to zero
in applications.
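A quick numerical check shows how restrictive these hypotheses are. The sketch
below tests the two conditions, read as
$q \ge \max(g^2, (h-1)^{2+\epsilon})$ and $g \ge (4/\epsilon + 2)(h+1)$, and
reports the resulting rate $(g - h)/q$. The function names and the sample
parameters are assumptions made for illustration.

```python
from math import sqrt


def hypotheses_hold(q, g, h, eps):
    """Check q >= max(g^2, (h-1)^(2+eps)) and g >= (4/eps + 2)(h + 1)."""
    return q >= max(g ** 2, (h - 1) ** (2 + eps)) and g >= (4 / eps + 2) * (h + 1)


def rate(q, g, h):
    """Information rate of RS_q[q, g - h]."""
    return (g - h) / q


def rate_ceiling(q):
    """Since g <= sqrt(q), the rate (g - h)/q stays below 1/sqrt(q)."""
    return 1 / sqrt(q)


q, g, h, eps = 2 ** 20, 1024, 100, 1.0
ok = hypotheses_hold(q, g, h, eps)
```

With these values the conditions hold, yet the rate is below $10^{-3}$, and the
ceiling $1/\sqrt{q}$ shrinks further as $q$ grows.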



3 The result for rate c = 1 



Now we show that Theorem 1 holds when the information rate approaches one.

Proposition 6. Let $g, h$ be positive integers such that for some constant
$\epsilon > 0$, we have

$$q \ge \max(g^2, (h - 1)^{2+\epsilon}), \quad g \ge \left(\frac{4}{\epsilon} + 2\right)(h + 1).$$

Then, every element in $\mathbf{F}_{q^h}^*$ can be written in at least $N(g, h)$
ways as a product of exactly $q - g$ distinct linear factors of the form
$\alpha + a$ with $a \in \mathbf{F}_q$.

To prove this proposition, we observe that the map that sends
$\beta \in \mathbf{F}_{q^h}^*$ to $\prod_{a \in \mathbf{F}_q} (\alpha + a) / \beta$ is
one-to-one from $\mathbf{F}_{q^h}^*$ to itself.

Proof: Note that

$$\prod_{a \in \mathbf{F}_q} (\alpha + a) \ne 0.$$

Given an element $\beta \in \mathbf{F}_{q^h}^*$, from Theorem 5 we have that
$\prod_{a \in \mathbf{F}_q} (\alpha + a) / \beta$ can be written in at least
$N(g, h)$ ways as a product of exactly $g$ distinct linear factors of the form
$\alpha + a$ with $a \in \mathbf{F}_q$. Taking the complementary set of factors in
each such representation, $\beta$ can be written in at least $N(g, h)$ ways as a
product of exactly $q - g$ distinct linear factors of the form $\alpha + a$ with
$a \in \mathbf{F}_q$. □
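The dual argument can be verified exhaustively on a toy field. The sketch below
works in $\mathbf{F}_{49} = \mathbf{F}_7[x]/(x^2 + 1)$, counts for each element how
many $g$-subsets of $\{\alpha + a\}$ multiply to it, and checks that complements
give the same counts for $(q - g)$-subsets. The field, the pair representation
of elements and all function names are assumptions for illustration. At this
size the hypotheses of Proposition 6 do not hold, so only the complement
bijection is being tested.

```python
from itertools import combinations

P = 7  # q = 7; elements of F_49 are pairs (c0, c1) meaning c0 + c1 * alpha, alpha^2 = -1


def mul(u, v):
    return ((u[0] * v[0] - u[1] * v[1]) % P, (u[0] * v[1] + u[1] * v[0]) % P)


def power(u, e):
    result = (1, 0)
    while e:
        if e & 1:
            result = mul(result, u)
        u = mul(u, u)
        e >>= 1
    return result


def inverse(u):
    return power(u, P * P - 2)      # u^(q^2 - 1) = 1 for nonzero u


def product(shifts):
    acc = (1, 0)
    for a in shifts:
        acc = mul(acc, (a, 1))      # the factor alpha + a
    return acc


def count_products(size):
    """Map each field element to the number of size-subsets multiplying to it."""
    counts = {}
    for subset in combinations(range(P), size):
        key = product(subset)
        counts[key] = counts.get(key, 0) + 1
    return counts


def check_duality(g):
    total = product(range(P))       # product of all alpha + a
    small, large = count_products(g), count_products(P - g)
    same = all(large.get(mul(total, inverse(gamma)), 0) == n
               for gamma, n in small.items())
    return same and sum(small.values()) == sum(large.values())
```

The full product equals $\alpha^7 - \alpha = -2\alpha$ here, which is nonzero as
the proof requires.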
It follows from Theorem 4 that we have the following two results.

Proposition 7. Suppose that

$$q \ge \max(g^2, (h - 1)^{2+\epsilon}), \quad g \ge \left(\frac{4}{\epsilon} + 2\right)(h + 1).$$

Then the maximal likelihood decoding of $RS_q[q, q - g - h]$ is as hard as the
discrete logarithm over the finite field $\mathbf{F}_{q^h}^*$.

Note that the rate $(q - g - h)/q$ approaches 1 as $q$ increases for
$g = O(\sqrt{q})$ and $h = O(g) = O(\sqrt{q})$.

Proposition 8. Suppose that

$$q \ge \max(g^2, (h - 1)^{2+\epsilon}), \quad g \ge \left(\frac{4}{\epsilon} + 2\right)(h + 1).$$

Let $h(x)$ be an irreducible polynomial of degree $h$ over $\mathbf{F}_q$ and let
$f(x)$ be a nonzero polynomial of degree less than $h$ over $\mathbf{F}_q$. Then in
the Reed-Solomon code $RS_q[q, q - g - h]$, the Hamming ball centered at
$\left(\frac{f(a)}{h(a)} + a^{q-g-h}\right)_{a \in \mathbf{F}_q}$ of radius $g$
contains at least $q^{g/2}/g!$ many codewords.

Note that if we set $g = \lceil \sqrt{q} \rceil$, then the number of codewords
is greater than $2^{\sqrt{q}}$, which is subexponential.



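Proof of Theorem 3. The relative radius of the Hamming ball in the above
proposition is $\frac{g}{g+h+1}$. If
$g = \lceil (\frac{4}{\epsilon} + 2)(h + 1) \rceil$, then the relative radius
approaches

$$\frac{\frac{4}{\epsilon} + 2}{\frac{4}{\epsilon} + 3} = \frac{2\epsilon + 4}{3\epsilon + 4}.$$

Select $\epsilon$ such that

$$\frac{2\epsilon + 4}{3\epsilon + 4} < \rho.$$

Note that $\epsilon$ can be large if $\rho$ is close to $2/3$. If
$q = \lceil g^{2+\epsilon} \rceil$, the number of codewords is at least

$$q^{g/2} / g! > g^{\epsilon g / 2}.$$

To make sure that this number is greater than $q^i$, we need
$g > \frac{2(2+\epsilon) i}{\epsilon}$. It is satisfied if we let $q$ be the
least prime power which is greater than

$$\left(\frac{2(2+\epsilon) i}{\epsilon}\right)^{2+\epsilon} = i^{O(1)}.$$

We then calculate $g = \lfloor q^{\frac{1}{2+\epsilon}} \rfloor$ and solve $h$ from
the equation $g = \lceil (\frac{4}{\epsilon} + 2)(h + 1) \rceil$. Finally we find
an irreducible polynomial $h(x)$ of degree $h$ over $\mathbf{F}_q$ using the
algorithm in [9]. □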



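4 The result for rate 0 < c < 1

We now consider the positive rate case with $0 < c < 1$. For this purpose, we
take $q = q_1^m$ with $m \ge 2$. Let $\alpha$ be an element in $\mathbf{F}_{q^h}$
with $\mathbf{F}_{q_1}[\alpha] = \mathbf{F}_{q^h}$. Since

$$\mathbf{F}_{q_1}[\alpha] \subseteq \mathbf{F}_q[\alpha] \subseteq \mathbf{F}_{q^h},$$

we also have $\mathbf{F}_{q^h} = \mathbf{F}_q[\alpha]$.

Theorem 9. Let $q = q_1^m$ with $m \ge 2$. Let $g_1$ and $g_2$ be non-negative
integers with $g_2 \le q - q_1$. Let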

- i i^P^ - (i + w) (•;« 

Then, every element in $\mathbf{F}_{q^h}^*$ can be written in at least
$N(g_1, g_2, h, m)$ ways as a product of exactly $g_1 + g_2$ distinct linear
factors of the form $\alpha + a$ with $a \in \mathbf{F}_q$. If for some constant
$\epsilon > 0$, we have

$$q_1 \ge \max(g_1^2, (mh - 1)^{2+\epsilon}), \quad g_1 \ge \left(\frac{4}{\epsilon} + 2\right)(mh + 1),$$

then

$$N(g_1, g_2, h, m) \ge \frac{q_1^{g_1/2}}{g_1!} \binom{q - q_1}{g_2} > 0.$$

Proof. Since $g_2 \le q - q_1$, we can choose $g_2$ distinct elements
$b_1, \ldots, b_{g_2}$ from the set $\mathbf{F}_q - \mathbf{F}_{q_1}$. For any element
$\beta \in \mathbf{F}_{q^h}^* = \mathbf{F}_{q_1^{mh}}^*$, since
$\mathbf{F}_{q_1}[\alpha] = \mathbf{F}_{q_1^{mh}}$, we can apply Theorem 5 to deduce that

$$\frac{\beta}{(\alpha + b_1) \cdots (\alpha + b_{g_2})} = (\alpha + a_1) \cdots (\alpha + a_{g_1}),$$

where the $a_i \in \mathbf{F}_{q_1}$ are distinct. The number of such sets
$\{a_1, a_2, \ldots, a_{g_1}\} \subseteq \mathbf{F}_{q_1}$ is greater than



9I !V "Hir-""' j- 

Since $\mathbf{F}_{q_1}$ and its complement $\mathbf{F}_q - \mathbf{F}_{q_1}$ are disjoint,
it follows that

$$\beta = (\alpha + b_1) \cdots (\alpha + b_{g_2}) (\alpha + a_1) \cdots (\alpha + a_{g_1})$$

is a product of exactly $g_1 + g_2$ distinct linear factors of the form
$\alpha + a$ with $a \in \mathbf{F}_q$. □

We now take $g_1 = \lfloor q^{1/(2m)} \rfloor = \lfloor \sqrt{q_1} \rfloor$ and
$g_2 = \lfloor cq \rfloor - g_1$ in the above theorem. Thus,
$g_1 + g_2 = \lfloor cq \rfloor$. We need $g_2$ to satisfy the inequalities

$$0 \le g_2 \le q - q_1 = q - q^{1/m}.$$

That is,

$$0 \le \lfloor cq \rfloor - \lfloor q^{1/(2m)} \rfloor \le q - q^{1/m}.$$

The left side inequality is satisfied if $q_1 \ge c^{-2/(2m-1)}$. The right side
inequality is satisfied if $q_1 \ge (1 - c)^{-1/(m-1)}$. Thus, we obtain
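The choice of $g_1$ and $g_2$ can be sketched as follows. The code uses exact
rational arithmetic for $c$ and compares the two sufficient conditions without
taking roots, using $q_1 \ge c^{-2/(2m-1)} \iff q_1^{2m-1} c^2 \ge 1$ and
$q_1 \ge (1-c)^{-1/(m-1)} \iff q_1^{m-1} (1 - c) \ge 1$. The function names and
sample values are hypothetical.

```python
from fractions import Fraction
from math import floor, isqrt


def split_factor_count(q1, m, c):
    """Return (q, g1, g2, feasible) with g1 = floor(sqrt(q1)), g1 + g2 = floor(c*q)."""
    q = q1 ** m
    g1 = isqrt(q1)
    g2 = floor(c * q) - g1
    feasible = 0 <= g2 <= q - q1
    return q, g1, g2, feasible


def sufficient_size(q1, m, c):
    """The two sufficient lower bounds on q1, checked exactly."""
    left = q1 ** (2 * m - 1) * c ** 2 >= 1
    right = q1 ** (m - 1) * (1 - c) >= 1
    return left and right


example = split_factor_count(16, 2, Fraction(1, 2))   # q = 256, rate 1/2
```

For $q_1 = 16$, $m = 2$, $c = 1/2$ this gives $g_1 = 4$ factors from the
subfield and $g_2 = 124$ factors from outside it. The bounds are sufficient but
not necessary, so `feasible` may hold even when `sufficient_size` fails.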

Theorem 10. Let $m \ge 2$ and $h \ge 2$ be two positive integers such that
$q = q_1^m$. Let $0 < c < 1$ be a constant such that

$$q_1 \ge \max\left((mh - 1)^{2+\epsilon}, \left(\frac{4}{\epsilon} + 2\right)(mh + 1)^2, c^{-\frac{2}{2m-1}}, (1 - c)^{-\frac{1}{m-1}}\right)$$

for some constant $\epsilon > 0$. Then, every element in $\mathbf{F}_{q^h}^*$ can be
written as a product of exactly $\lfloor cq \rfloor$ distinct linear factors of
the form $\alpha + a$ with $a \in \mathbf{F}_q$.

Combining this theorem together with Theorem 4, we deduce

Theorem 11. Let $m \ge 2$ and $h \ge 2$ be two positive integers such that
$q = q_1^m$. Let $0 < c < 1$ be a constant such that

$$q_1 \ge \max\left((mh - 1)^{2+\epsilon}, \left(\frac{4}{\epsilon} + 2\right)(mh + 1)^2, c^{-\frac{2}{2m-1}}, (1 - c)^{-\frac{1}{m-1}}\right)$$

for some constant $\epsilon > 0$. Then, the maximal likelihood decoding of the
Reed-Solomon code $RS_q[q, \lfloor cq \rfloor - h]$ is at least as hard (in
random time $q^{O(1)}$ reduction) as the discrete logarithm in
$\mathbf{F}_{q^h}^*$.

Taking $m = 2$ in this theorem, we deduce Theorem 1.

Proposition 12. Let $h$ be a positive integer and $0 < c < 1$ be a constant. Let
$q_1$ be a prime power such that

$$q_1 \ge \max\left((2h - 1)^{2+\epsilon}, \left(\frac{4}{\epsilon} + 2\right)(2h + 1)^2, c^{-2/3}, (1 - c)^{-1}\right) \qquad (1)$$

for some constant $\epsilon > 0$. Let $q = q_1^2$. Let $h(x)$ be an irreducible
polynomial of degree $h$ over $\mathbf{F}_q$ whose root $\alpha$ satisfies
$\mathbf{F}_{q_1}[\alpha] = \mathbf{F}_{q^h}$. Let $f(x)$ be a nonzero polynomial over
$\mathbf{F}_q$ of degree less than $h$. Then in the Reed-Solomon code
$RS_q[q, \lfloor cq \rfloor - h]$, the Hamming ball centered at
$\left(\frac{f(a)}{h(a)} + a^{\lfloor cq \rfloor - h}\right)_{a \in \mathbf{F}_q}$ of
radius $q - \lfloor cq \rfloor$ contains at least $\exp(\Omega(q))$ many
codewords.

Proof: The number of codewords in the ball is greater than

$$\frac{q_1^{\lfloor \sqrt{q_1} \rfloor / 2}}{\lfloor \sqrt{q_1} \rfloor !} \binom{q - q_1}{\lfloor cq \rfloor - \lfloor \sqrt{q_1} \rfloor},$$

which is greater than
$\binom{q - q_1}{\lfloor cq \rfloor - \lfloor \sqrt{q_1} \rfloor} = \exp(\Omega(q))$. □

Proof of Theorem 2. Let $q$ be the square of the $i$-th prime power (listed in
increasing order). Assume that $i$ is large enough such that
$\sqrt{q} \ge \max(c^{-2/3}, (1 - c)^{-1})$. We then let $\epsilon$ be
$1/\log q$ and $h$ be the largest integer satisfying (1). It remains to find an
irreducible polynomial of degree $h$ over $\mathbf{F}_q$ whose root $\alpha$
satisfies $\mathbf{F}_{q_1}[\alpha] = \mathbf{F}_{q^h}$. Let $p$ be the characteristic of
$\mathbf{F}_q$. We can use $\alpha$ such that $\mathbf{F}_p[\alpha] = \mathbf{F}_{q^h}$. We
need to find an irreducible polynomial of degree $h \log_p q$ over $\mathbf{F}_p$.
It can be done in time polynomial in $p$ and the degree [9]. Then we factor the
polynomial over $\mathbf{F}_q$ and take any factor to be $h(x)$. As for $f(x)$, we
may simply let $f(x) = 1$. □

5 Conclusion and future research 

In this paper, we show that the maximal likelihood decoding of the Reed-Solomon
code is at least as hard as the discrete logarithm for any given information
rate. In our result, we assumed that the cardinality of the finite field is not
a prime. While this is not a problem in practical applications (for example,
q = 256 is quite popular), it would be interesting to remove this restriction,
that is, to allow prime finite fields as well.

Many important questions about decoding Reed-Solomon codes remain open. For
example, little is known about the exact list decoding radius of Reed-Solomon
codes. In particular, does there exist a Hamming ball of relative radius less
than one which contains super-polynomially many codewords in Reed-Solomon codes
of rate less than one?